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1 Introduction 



The present state of the art algorithm for lattice QCD with dynamical fermions, used 
in all production runs, is the hybrid Monte Carlo [HH^ (HMC) algorithm (for four stag- 
gered or two Wilson fermions) , or the hybrid molecular dynamics variant of this for fewer 
fermions. In this letter we investigate the effectiveness of the HMC algorithm, both with 
and without fermions, in updating the topological sector. For the former we use four 
flavours of staggered fermions. For the latter case we compare the HMC with over-relaxed 
heat-bath results. 

We find that decorrelating the topology requires a dramatic increase in computer time as 
the quark mass decreases, over and above the increase naively expected. For the smallest 
mass, the algorithm requires days to decorrelate the topology, running on a powerful 
computer (25Gflop APE/QUADRICS). This implies that one cannot generate a set of 
configurations with the correct sampling of the topological sector using brute force. 

The correct sampling of topological mode^ is clearly crucial for quantities such as the 
r( mass, which depend directly on the topology via the explicit breaking of the UkiX) 
symmetry. It can also become relevant for other quantities, for example the proton mass, 
at some level of accuracy. Some attempts to study the topology in the presence of fermions 
have been presented in the literature 



2 Topology 

This investigation began as part of a calculation of the spin content of the proton |]^. 
This can be related to the matrix element of the topological charge density operator q[x) 
between on-shell proton states with small momentum transfer. To this end we prepared 
a large set of configurations using the HMC algorithm. We examined the longest auto- 
correlations in this set, namely the auto-correlation time of the global topological charge 
Q. Any reasonable definition of Q is sufficient to determine the auto-correlations; we used 
the field theoretic definition of Q, measured on cooled p configurations. 

The global topological charge Q is defined as 

-1 

Q = E 04 ^ Qo^2 E W-Tr[H^^(x)H^,(x)]. (1) 
where H^j,(x) is the plaquette in the /iz/ plane. A related quantity is the topological 



^ We regard the topological modes to be those which, in the continuum Umit, determine the 
topological properties of the lattice. 
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susceptibilty x defined as 



X = m/v, (2) 

where V4 is the lattice four-volume. The cooled configurations are obtained from each 
configuration generated by the HMC by minimizing locally the gauge part ||^ of the 
action. 

Note that the conclusions drawn in this letter about the decorrelation of global topological 
charge by the HMC algorithm are independent of the definition of Q used here. It is 
sufficient to assume that the cooling procedure is able to capture the most slowly updated 
modes associated with the 'topological' content of the lattice configurations, a perfectly 
reasonable assumption. Possible systematic errors in this procedure will not change our 
final conclusions. 

As mentioned above, in order to extract physical results related to the topological proper- 
ties, a Monte Carlo simulation must provide a set of configurations covering phase space. 
In particular, the global topological charge must average to zero. So the algorithm must 
be effective in changing topological modes. As we shall see, this efficiency appears unob- 
tainable for values of (3 large enough to have scaling when the quark mass is taken as 
small as currently feasible (eg., m = 0.01). Note that this dramatic slowing down is seen 
for quark masses still far from the chiral limit. 

Although the existence of long auto-correlations has been observed 0,^] in previous studies 
using the HMC algorithm, results on many quantities, some associated with the topol- 
ogy, have been reported. We believe that critical slowing down in the HMC has been 
under-estimated with respect to the topological modes. As we shall show in the following, 
this casts serious doubt on the possibility of performing any study involving topological 
properties in the relevant region of (3 and m. 



3 Results 

We have used the Wilson action for the pure gauge sector, and four flavours of staggered 
fermions. Having four flavours allows the use of an exact algorithm, the so-called $ algo- 
rithm described in detail in reference 0. The molecular dynamics equations of motion are 
integrated in flctitious time r, using the leapfrog integration method, and beginning with 
a half-step in the gauge flelds. The two relevant units for comparing the auto-correlations 
from different simulations are then flctitious time in units of r, and wall clock time. The 
production runs were done using a 16^ x 24 lattice at a coupling (3 = 5.35 for full QCD, 
and a 16^ x 16 lattice at /5 = 6.00 for pure SU(3). 

Let us begin with the results for quark mass m = 0.01. This yields a pion to rho mass 
ratio of iriT^/mp ^ 0.5 0, so we are still quite far from the physical value. The lattice 
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Fig. 1. Time history, in units of molecular dynamics time r, of the topological charge Q for the 
HMC simulation at (3 = 5.35 and m = 0.01 on a 16^ x 24 lattice. 

spacing is a ~ 0.14fm (from nip), which gives a lattice volume of V3 ~ (2.2fm)'^. The 
parameters used for the HMC simulation were 

■nraj = 0.3, 

At = 0.004, ^ ' 

where At is the step size of the molecular dynamic evolution, and Ttraj the length of each 
trajectory. We performed a rather long simulation, with a total length after thermalisation 
of r ~ 500 in units of fictitious time. Fig. |I| shows the Monte Carlo time-history of the 
topological charge, as determined after 25 cooling sweeps of a pseudo-heat-bath algorithm 
at /3 = cx). These data show clearly that the HMC is unable to change the global topolog- 
ical modes efficiently, leading to very long auto-correlations. The value of the topological 
charge got stuck at around Q — — 2, and its value averaged over all configurations gener- 
ated so far is decidedly non-zero: {Q) = —1.7(4). A very rough estimate of the integrated 
auto-correlation time Tq from a blocking analysis of the data gives Tg > 2 x 10^ in units of 
molecular dynamics time. The simulations were performed on the 25 Gfiops APE tower, 
with around 50% efficiency. On this machine Tq ~ 200 corresponds to about three days 
of computer time, a considerable amount. Notice that most simulations at comparable 
values of jS and m presented in the literature have r ~ 100. 

To further investigate the behaviour of the HMC, we have performed simulations with 
larger quark masses, m = 0.035 and m = 0.05, and in the quenched case, which represents 
the large quark mass limit of full QCD. Quenched simulations were performed at (3 = 6.0, 
and here HMC has been compared with one of the best available local algorithms, the 
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Table 1 

Results from, and parameters used for the Hybrid Monte Carlo (HMC) runs, and, in the last 
line, the heat-bath over-relaxed (HBOR) run. The full QCD runs with four staggered fermions 
were performed on a 16^ X 24 lattice, the pure SU(3) runs on a 16^ x 16 lattice. The length of each 
trajectory in fictitious time is rtraj, the total length is rtotai in units of fictitious time and At is 
the step size. The last three columns give the integrated auto-correlation time of Q in units of 
fictitious time and wall clock hours on the 25 Gflops Quadrics, as well as the auto-correlations 
for the plaquette (Tn). For the HBOR algorithm the auto-correlations are given in numbers of 
cycles and wall clock seconds. 
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over-relaxed heat-bath. One cycle of this algorithm consists of 5 microcanonical updates 
followed by a pseudo-heat-bath update. We mention that in pure gauge theory a local ver- 
sion of the HMC can be constructed [|I^ , which performs better than the HMC algorithm, 
but which cannot be extended to fermions. 

In Table |l] we give the parameters used in our HMC simulations and the corresponding 
estimates of the integrated auto-correlation time of the topological charge Tq, given in 
molecular dynamics time units and in CPU hours. Tq has been estimated from a binning 
procedure, as well as directly where the statistics were sufficient to be able to calculate the 
integrated auto-correlation time. In these cases both agree. The binning estimates should 
be regarded as a lower limit for m = 0.01, and with a possible uncertainty of 10-20% for 
all other values. This is sufficient to our purpose. The results for the quenched case must 
be compared with the auto-correlation time of the local over-relaxed algorithm which is 
Tq ^ 5.7 cycles, or Tq ^ 5.2 seconds in CPU time. 
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There are three major resuhs that have emerged from this work: 



(i) In HMC simulations the integrated auto-correlation time of the global topological 
charge decreases as the length of the trajectory increases. If expressed in terms of 
computer time, however, there is a decrease in performance for trajectories longer 
than one unit in molecular dynamics time. The best step size in pure SU(3) seems 
to be one giving an acceptance rate around 70%. 

(ii) The performance of the HMC is already poor for pure gauge theory, i.e. in the limit 
m oo. At /? = 6.0, it turns out to be about two orders of magnitude slower 
in decorrelating Q than local over- relaxed algorithms. For our 'best' set of HMC 
parameters we find that the plaquette decorrelates about 30 times slower than the 
heat-bath over-relaxed algorithm, which agrees with the results of The plaquette 
auto-correlation time in units of r is constant, though. 

For our 'best' parameters in pure SU(3) Q decorrelates 60 times slower with the 
HMC than with the heat-bath over-relaxed. Furthermore the auto-correlation of Q 
is far more sensitive to the choice of parameters than the plaquette, increasing dra- 
matically in units of r, and of course in real time as well, the more the parameters 
differ from optimal. 

(iii) The auto-correlation time Tq rapidly increases with decreasing quark mass, in terms 
of both CPU time (which is also due to critical slowing down in the inversion of 
the fermion matrix) and molecular dynamic time. The increase in real time required 
appears to scale as Tq = 1/m" with a ~ 3 to 5. This is considerably more than 
the a ~ 2.5 expected from using = 1/m as the relevant physical quantity [0 
causing slowing down, a factor 1/m from the matrix inversion, and another ^ 0.5 
from the change in step size and acceptance rate. 

The auto-correlation time for the plaquette, on the other hand, remains similar 
in units of r. So the increase in real time required for the topology is considerably 
more than the increase in real time required for smaller quark masses that one has 
previously learnt to expect []TT],|I3| based on studies of more local correlators like the 
plaquette. 

In our rather long HMC simulation at /5 = 5.35 and m = 0.01 (r ~ 500 in 
molecular dynamic time unit) the sampling of the topological modes appears not to 
be correct, making the determination of x impossible. We cannot even estimate how 
long the run should have been to get At m = 0.035 the sampling of Q turned out 
to be sufficiently long to allow a calculation of x- 



In conclusion we have shown that the HMC algorithm has serious problems decorrelat- 
ing global topogical modes, more serious than those associated with commonly studied 
quantities. For full QCD the algorithm appears to slow down as roughly We stress 

again that this warning must be seriously considered when simulating full QCD. It is 
especially important when studying quantities related to the topology, but may be less 
relevant in the calculation of the mass spectrum (with the exception of, for example, the 
1]' mass), since an effective decoupling from the topological modes is expected. Note also 
that this slowing down is independant of both the number of fermions and type of fermion 
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simulated, as the underlying dynamics of the algorithm are the same as the cases studied 
here. 



It may also be possible to improve the performance of the HMC algorithm at small 
masses beyond gains coming from tuning the trajectory length and step size via the use 
of simulated tempering WM. In this, the quark mass becomes a dynamical variable in 



the simulation. This helps in decorrelating the topological charge both because there are 
more fluctuations in the charge when the mass is large, and because the conjugate gradient 
inversion is faster for larger masses. This is presently under investigation. Finally, we men- 
tion that other algorithms, not based on equations of motion, (for example, Liischer's 
multi-boson algorithm), may perform better with respect to the topological modes than 
the HMC algorithm. This is also being examined. 
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